(54) Image processing apparatus

(57) Image data from a plurality of cameras 2-1, 2-2, 2-3 showing the movements of a number of people, for
example in a meeting, and sound data from a directional microphone array 4 is processed by a computer
processing apparatus 24 to archive the data in a meeting archive database 60. The image data is processed
to determine the three-dimensional position and orientation of each person's head and to determine at whom
each person is looking. The sound data is processed to determine the direction from which the sound came.
Processing is carried out to determine who is speaking by determining which person has his head in a position
corresponding to the direction from which the sound came. Having determined which person is speaking, the
personal speech recognition parameters for that person are selected and used to convert the sound data to text
data. Image data to be archived is chosen by selecting the camera which best shows the speaking participant
and the participant to whom he is speaking. Image data, sound data, text data and data defining at whom each
person is looking is stored in the meeting archive database 60.
Description

[0001] The present invention relates to the archiving of image data. 

[0002] Many databases exist for the storage of image data. However, problems exist because, inter alia, the amount
of image data to be stored can be large, and because ways in which the database can be interrogated to retrieve
information therefrom are limited.

[0003] The present invention has been made with this in mind.

[0004] According to the present invention, there is provided an apparatus or method in which processing is per- 
formed to archive image data from a plurality of cameras which shows people talking. The person speaking and the per- 
son (or object) at whom he is looking are determined, and a subset of the image data is selected to be archived in 
dependence thereon. 

[0005] In this way, it is not necessary to store the image data from all of the cameras, thereby reducing storage
requirements. 

[0006] The present invention also provides an apparatus or method for selecting image data from among image
data recorded by a plurality of cameras which shows people talking, in which the position in three dimensions of at least
the head of the person who is speaking and the person (or object) at whom he is looking are determined by processing 
at least some of the image data, and the selection of image data is made based on the determined positions and the 
views of the cameras. 

[0007] The present invention further provides instructions, including in signal and recorded form, for configuring a
programmable processing apparatus to become arranged as an apparatus, or to become operable to perform a 
method, in such a system. 

[0008] Embodiments of the invention will now be described, by way of example only, with reference to the
accompanying drawings, in which:

Figure 1 illustrates the recording of sound and video data from a meeting between a plurality of participants;

Figure 2 is a block diagram showing an example of notional functional components within a processing apparatus
in an embodiment;

Figure 3 shows the processing operations performed by processing apparatus 24 in Figure 2 prior to the meeting
shown in Figure 1 between the participants starting;

Figure 4 schematically illustrates the data stored in meeting archive database 60 at step S2 and step S4 in Figure
3;

Figure 5 shows the processing operations performed at step S34 in Figure 3 and step S70 in Figure 7;

Figure 6 shows the processing operations performed at each of steps S42-1, S42-2 and S42-n in Figure 5;

Figure 7 shows the processing operations performed by processing apparatus 24 in Figure 2 while the meeting
between the participants is taking place;

Figure 8 shows the processing operations performed at step S72 in Figure 7;

Figure 9 shows the processing operations performed at step S80 in Figure 8;

Figure 10 illustrates the viewing ray for a participant used in the processing performed at step S114 and step S124
in Figure 9;

Figure 11 illustrates the angles calculated in the processing performed at step S114 in Figure 9;

Figure 12 shows the processing operations performed at step S84 in Figure 8;

Figure 13 shows the processing operations performed at step S89 in Figure 8;

Figure 14 shows the processing operations performed at step S168 in Figure 13;

Figure 15 schematically illustrates the storage of information in the meeting archive database 60;

Figures 16A and 16B show examples of viewing histograms defined by data stored in the meeting archive database
60;

Figure 17 shows the processing operations performed at step S102 in Figure 8;

Figure 18 shows the processing operations performed by processing apparatus 24 to retrieve information from the
meeting archive database 60;

Figure 19A shows the information displayed to a user at step S200 in Figure 18;

Figure 19B shows an example of information displayed to a user at step S204 in Figure 18; and

Figure 20 schematically illustrates an embodiment in which a single database stores information from a plurality of
meetings and is interrogated from one or more remote apparatus.

[0009] Referring to Figure 1, a plurality of video cameras (three in the example shown in Figure 1, although this
number may be different) 2-1, 2-2, 2-3 and a microphone array 4 are used to record image data and sound data
respectively from a meeting taking place between a group of people 6, 8, 10, 12.

[0010] The microphone array 4 comprises an array of microphones arranged such that the direction of any incoming
sound can be determined, for example as described in GB-A-2140558, US 4333170 and US 3392392.
[0011] The image data from the video cameras 2-1, 2-2, 2-3 and the sound data from the microphone array 4 is 
input via cables (not shown) to a computer 20 which processes the received data and stores data in a database to cre- 
ate an archive record of the meeting from which information can be subsequently retrieved. 

[0012] Computer 20 comprises a conventional personal computer having a processing apparatus 24 containing, in 
a conventional manner, one or more processors, memory, sound card etc., together with a display device 26 and user 
input devices, which, in this embodiment, comprise a keyboard 28 and a mouse 30.

[0013] The components of computer 20 and the input and output of data therefrom are schematically shown in
Figure 2.

[0014] Referring to Figure 2, the processing apparatus 24 is programmed to operate in accordance with programming
instructions input, for example, as data stored on a data storage medium, such as disk 32, and/or as a signal 34
input to the processing apparatus 24, for example from a remote database, by transmission over a communication
network (not shown) such as the Internet or by transmission through the atmosphere, and/or entered by a user via a user
input device such as keyboard 28 or other input device.

[0015] When programmed by the programming instructions, processing apparatus 24 effectively becomes configured
into a number of functional units for performing processing operations. Examples of such functional units and their
interconnections are shown in Figure 2. The illustrated units and interconnections in Figure 2 are, however, notional and
are shown for illustration purposes only, to assist understanding; they do not necessarily represent the exact units and 
connections into which the processor, memory etc of the processing apparatus 24 become configured. 
[0016] Referring to the functional units shown in Figure 2, a central controller 36 processes inputs from the user
input devices 28, 30 and receives data input to the processing apparatus 24 by a user as data stored on a storage
device, such as disk 38, or as a signal 40 transmitted to the processing apparatus 24. The central controller 36 also pro- 
vides control and processing for a number of the other functional units. Memory 42 is provided for use by central con- 
troller 36 and other functional units. 

[0017] Head tracker 50 processes the image data received from video cameras 2-1, 2-2, 2-3 to track the position
and orientation in three dimensions of the head of each of the participants 6, 8, 10, 12 in the meeting. In this
embodiment, to perform this tracking, head tracker 50 uses data defining a three-dimensional computer model of the head of
each of the participants and data defining features thereof, which is stored in head model store 52, as will be described
below. 

[0018] Direction processor 53 processes sound data from the microphone array 4 to determine the direction or
directions from which the sound recorded by the microphones was received. Such processing is performed in a
conventional manner, for example as described in GB-A-2140558, US 4333170 and US 3392392.
[0019] Voice recognition processor 54 processes sound data received from microphone array 4 to generate text
data therefrom. More particularly, voice recognition processor 54 operates in accordance with a conventional voice
recognition program, such as "Dragon Dictate" or IBM "ViaVoice", to generate text data corresponding to the words spoken
by the participants 6, 8, 10, 12. To perform the voice recognition processing, voice recognition processor 54 uses data
defining the speech recognition parameters for each participant 6, 8, 10, 12, which is stored in speech recognition
parameter store 56. More particularly, the data stored in speech recognition parameter store 56 comprises data
defining the voice profile of each participant which is generated by training the voice recognition processor in a conventional
manner. For example, the data comprises the data stored in the "user files" of Dragon Dictate after training.
[0020] Archive processor 58 generates data for storage in meeting archive database 60 using data received from
head tracker 50, direction processor 53 and voice recognition processor 54. More particularly, as will be described
below, video data from cameras 2-1, 2-2 and 2-3 and sound data from microphone array 4 is stored in meeting archive
database 60 together with text data from voice recognition processor 54 and data defining at whom each participant in 
the meeting was looking at a given time. 

[0021] Text searcher 62, in conjunction with central controller 36, is used to search the meeting archive database 
60 to find and replay the sound and video data for one or more parts of the meeting which meet search criteria specified 
by a user, as will be described in further detail below. 

[0022] Display processor 64 under control of central controller 36 displays information to a user via display device
26 and also replays sound and video data stored in meeting archive database 60. 

[0023] Output processor 66 outputs part or all of the data from archive database 60, for example on a storage 
device such as disk 68 or as a signal 70.

[0024] Before beginning the meeting, it is necessary to initialise computer 20 by entering data which is necessary 
to enable processing apparatus 24 to perform the required processing operations.

[0025] Figure 3 shows the processing operations performed by processing apparatus 24 during this initialisation.
[0026] Referring to Figure 3, at step S1, central controller 36 causes display processor 64 to display a message on
display device 26 requesting the user to input the names of each person who will participate in the meeting.
[0027] At step S2, upon receipt of data defining the names, for example input by the user using keyboard 28, central
controller 36 allocates a unique identification number to each participant, and stores data, for example table 80 shown
in Figure 4, defining the relationship between the identification numbers and the participants' names in the meeting
archive database 60. 

[0028] At step S3, central controller 36 causes display processor 64 to display a message on display device 26
requesting the user to input the name of each object at which a person may look for a significant amount of time during
the meeting, and for which it is desired to store archive data in the meeting archive database 60. Such objects may
include, for example, a flip chart, such as the flip chart 14 shown in Figure 1, a whiteboard or blackboard, or a television,
etc. 

[0029] At step S4, upon receipt of data defining the names of the objects, for example input by the user using
keyboard 28, central controller 36 allocates a unique identification number to each object, and stores data, for example as
in table 80 shown in Figure 4, defining the relationship between the identification numbers and the names of the objects
in the meeting archive database 60. 
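
The allocation of identification numbers at steps S2 and S4 can be sketched as below. This is only an illustrative sketch: the function name `allocate_ids`, the use of a single shared numbering for participants and objects, and the choice to start at 1 are assumptions, made so that the value "0" remains free to mean "looking at nothing known" as the description uses it later.

```python
from itertools import count


def allocate_ids(participant_names, object_names):
    """Give every participant and every object a unique identification number.

    Assumption: participants and objects share one numbering, starting at 1,
    so that a stored number identifies either kind without ambiguity.
    """
    next_id = count(1)  # 0 is deliberately left unused
    participants = {next(next_id): name for name in participant_names}
    objects = {next(next_id): name for name in object_names}
    return participants, objects


# Hypothetical names entered by the user at steps S1 and S3.
participants, objects = allocate_ids(["Ann", "Bob", "Cho", "Dev"], ["flip chart"])
```

The two returned dictionaries play the role of the table relating identification numbers to names.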

[0030] At step S6, central controller 36 searches the head model store 52 to determine whether data defining a
head model is already stored for each participant in the meeting. 

[0031] If it is determined at step S6 that a head model is not already stored for one or more of the participants, then,
at step S8, central controller 36 causes display processor 64 to display a message on display device 26 requesting the
user to input data defining a head model of each participant for whom a model is not already stored.
[0032] In response, the user enters data, for example on a storage medium such as disk 38 or by downloading the 
data as a signal 40 from a connected processing apparatus, defining the required head models. Such head models may 
be generated in a conventional manner, for example as described in "An Analysis/Synthesis Cooperation for Head 
Tracking and Video Face Cloning" by Valente et al in Proceedings ECCV '98 Workshop on Perception of Human Action,
University of Freiberg, Germany, June 6 1998.

[0033] At step S10, central controller 36 stores the data input by the user in head model store 52.
[0034] At step S12, central controller 36 and display processor 64 render each three-dimensional computer head
model input by the user to display the model to the user on display device 26, together with a message requesting the 
user to identify at least seven features in each model. 

[0035] In response, the user designates using mouse 30 points in each model which correspond to prominent
features on the front, sides and, if possible, the back, of the participant's head, such as the corners of eyes, nostrils, mouth,
ears or features on glasses worn by the participant, etc. 

[0036] At step S14, data defining the features identified by the user is stored by central controller 36 in head model
store 52. 

[0037] On the other hand, if it is determined at step S6 that a head model is already stored in head model store 52 
for each participant, then steps S8 to S14 are omitted.

[0038] At step S16, central controller 36 searches speech recognition parameter store 56 to determine whether 
speech recognition parameters are already stored for each participant.

[0039] If it is determined at step S16 that speech recognition parameters are not available for all of the participants, 
then, at step S18, central controller 36 causes display processor 64 to display a message on display device 26
requesting the user to input the speech recognition parameters for each participant for whom the parameters are not already
stored. 
[0040] In response, the user enters data, for example on a storage medium such as disk 38 or as a signal 40 from
a remote processing apparatus, defining the necessary speech recognition parameters. As noted above, these
parameters define a profile of the user's speech and are generated by training a voice recognition processor in a conventional
manner. Thus for example, in the case of a voice recognition processor comprising Dragon Dictate, the speech
recognition parameters input by the user correspond to the parameters stored in the "user files" of Dragon Dictate.

[0041] At step S20, data defining the speech recognition parameters input by the user is stored by central controller
36 in the speech recognition parameter store 56.

[0042] On the other hand, if it is determined at step S16 that the speech recognition parameters are already
available for each of the participants, then steps S18 and S20 are omitted.

[0043] At step S22, central controller 36 causes display processor 64 to display a message on display device 26
requesting the user to perform steps to enable the cameras 2-1, 2-2 and 2-3 to be calibrated.
[0044] In response, the user carries out the necessary steps and, at step S24, central controller 36 performs
processing to calibrate the cameras 2-1, 2-2 and 2-3. More particularly, in this embodiment, the steps performed by the
user and the processing performed by central controller 36 are carried out in a manner such as that described in
"Calibrating and 3D Modelling with a Multi-Camera System" by Wiles and Davison in 1999 IEEE Workshop on Multi-View
Modelling and Analysis of Visual Scenes, ISBN 0769501109. This generates calibration data defining the position and
orientation of each camera 2-1, 2-2 and 2-3 with respect to the meeting room and also the intrinsic parameters of each
camera (aspect ratio, focal length, principal point, and first order radial distortion coefficient). The camera calibration
data is stored, for example in memory 42.

[0045] At step S25, central controller 36 causes display processor 64 to display a message on display device 26
requesting the user to perform steps to enable the position and orientation of each of the objects for which identification
data was stored at step S4 to be determined.

[0046] In response, the user carries out the necessary steps and, at step S26, central controller 36 performs
processing to determine the position and orientation of each object. More particularly, in this embodiment, the user
places coloured markers at points on the perimeter of the surface(s) of the object at which the participants in the
meeting may look, for example the plane of the sheets of paper of flip chart 14. Image data recorded by each of cameras
2-1, 2-2 and 2-3 is then processed by central controller 36 using the camera calibration data stored at step S24 to
determine, in a conventional manner, the position in three dimensions of each of the coloured markers. This processing is
performed for each camera 2-1, 2-2 and 2-3 to give separate estimates of the position of each coloured marker, and an
average is then determined for the position of each marker from the positions calculated using data from each camera
2-1, 2-2 and 2-3. Using the average position of each marker, central controller 36 calculates in a conventional manner
the centre of the object surface and a surface normal to define the orientation of the object surface. The determined
position and orientation for each object is stored as object calibration data, for example in memory 42.
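
The object calibration at step S26 is said to be done "in a conventional manner" without naming a method. The sketch below shows one plausible way to carry it out: the per-camera marker estimates are averaged, the surface centre is taken as the mean of the averaged markers, and the normal is computed with Newell's method. The function names, the use of Newell's method, and the requirement that markers are listed in order around the perimeter are all assumptions.

```python
import math


def mean_point(points):
    """Component-wise mean of a sequence of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def average_markers(per_camera_markers):
    """Average each marker's position over the separate per-camera estimates.

    per_camera_markers[c][m] is the 3D position of marker m estimated from camera c.
    """
    return [mean_point(estimates) for estimates in zip(*per_camera_markers)]


def surface_centre_and_normal(markers):
    """Centre and unit normal of the planar surface outlined by the markers.

    Assumption: markers are ordered around the perimeter. The normal uses
    Newell's method, which tolerates slightly non-planar marker positions.
    """
    centre = mean_point(markers)
    nx = ny = nz = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(markers, markers[1:] + markers[:1]):
        nx += (y0 - y1) * (z0 + z1)
        ny += (z0 - z1) * (x0 + x1)
        nz += (x0 - x1) * (y0 + y1)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return centre, (nx / length, ny / length, nz / length)


# Two hypothetical cameras that disagree slightly about the marker heights.
cam_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
cam_b = [(0.0, 0.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2), (0.0, 1.0, 0.2)]
centre, normal = surface_centre_and_normal(average_markers([cam_a, cam_b]))
```

The sign of the normal depends on the order in which the markers are listed, so a real system would need a convention for which side of the surface faces the room.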
[0047] At step S27, central controller 36 causes display processor 64 to display a message on display device 26
requesting the next participant in the meeting (this being the first participant the first time step S27 is performed) to sit
down.

[0048] At step S28, processing apparatus 24 waits for a predetermined period of time to give the requested
participant time to sit down, and then, at step S30, central controller 36 processes the respective image data from each
camera 2-1, 2-2 and 2-3 to determine an estimate of the position of the seated participant's head for each camera. More
particularly, in this embodiment, central controller 36 carries out processing separately for each camera in a
conventional manner to identify each portion in a frame of image data from the camera which has a colour corresponding to
the colour of the skin of the participant (this colour being determined from the data defining the head model of the
participant stored in head model store 52), and then selects the portion which corresponds to the highest position in the
meeting room (since it is assumed that the head will be the highest skin-coloured part of the body). Using the position
of the identified portion in the image and the camera calibration parameters determined at step S24, central controller
36 then determines an estimate of the three-dimensional position of the head in a conventional manner. This
processing is performed for each camera 2-1, 2-2 and 2-3 to give a separate head position estimate for each camera.
[0049] At step S32, central controller 36 determines an estimate of the orientation of the participant's head in three
dimensions for each camera 2-1, 2-2 and 2-3. More particularly, in this embodiment, central controller 36 renders the
three-dimensional computer model of the participant's head stored in head model store 52 for a plurality of different
orientations of the model to produce a respective two-dimensional image of the model for each orientation. In this
embodiment, the computer model of the participant's head is rendered in 108 different orientations to produce 108 respective
two-dimensional images, the orientations corresponding to 36 rotations of the head model in 10° steps for each of three
head inclinations corresponding to 0° (looking straight ahead), +45° (looking up) and -45° (looking down). Each
two-dimensional image of the model is then compared by central processor 36 with the part of the video frame from a
camera 2-1, 2-2, 2-3 which shows the participant's head, and the orientation for which the image of the model best matches
the video image data is selected, this comparison and selection being performed for each camera to give a head
orientation estimate for each camera. When comparing the image data produced by rendering the head model with the video
data from a camera, a conventional technique is used, for example as described in "Head Tracking Using a Textured
Polygonal Model" by Schödl, Haro & Essa in Proceedings 1998 Workshop on Perceptual User Interfaces.
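
The exhaustive search over 108 candidate orientations at step S32 can be sketched as follows. The grid (36 rotations in 10° steps for each of three inclinations) comes from the description; the `render` and `similarity` callables are hypothetical stand-ins for the model renderer and the image comparison technique, which the description leaves to conventional methods.

```python
from itertools import product

ROTATIONS_DEG = range(0, 360, 10)   # 36 rotations in 10 degree steps
INCLINATIONS_DEG = (0, 45, -45)     # straight ahead, looking up, looking down


def candidate_orientations():
    """All (rotation, inclination) pairs tried when initialising the tracker."""
    return [(rot, inc) for inc, rot in product(INCLINATIONS_DEG, ROTATIONS_DEG)]


def best_orientation(render, head_patch, similarity):
    """Return the candidate whose rendered model image best matches the video patch.

    render(rot, inc) and similarity(rendered, head_patch) are assumed callables;
    a higher similarity value means a better match.
    """
    return max(candidate_orientations(),
               key=lambda o: similarity(render(*o), head_patch))


# Toy stand-ins: an "image" is just the pose itself, and similarity is the
# negated absolute difference, so the nearest grid pose should win.
toy_render = lambda rot, inc: (rot, inc)
toy_similarity = lambda a, b: -(abs(a[0] - b[0]) + abs(a[1] - b[1]))
estimate = best_orientation(toy_render, (92, -40), toy_similarity)
```

In the embodiment this search would be run once per camera, giving one orientation estimate for each.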
[0050] At step S34, the respective estimates of the position of the participant's head generated at step S30 and the
respective estimates of the orientation of the participant's head generated at step S32 are input to head tracker 50 and
frames of image data received from each of cameras 2-1, 2-2 and 2-3 are processed to track the head of the participant.
More particularly, in this embodiment, head tracker 50 performs processing to track the head in a conventional manner,
for example as described in "An Analysis/Synthesis Cooperation for Head Tracking and Video Face Cloning" by Valente
et al in Proceedings ECCV '98 Workshop on Perception of Human Action, University of Freiberg, Germany, June 6
1998.

[0051] Figure 5 summarises the processing operations performed by head tracker 50 at step S34.
[0052] Referring to Figure 5, in each of steps S42-1 to S42-n ("n" being three in this embodiment since there are
three cameras), head tracker 50 processes image data from a respective one of the cameras recording the meeting to
determine the positions of the head features of the participant (stored at step S14) in the image data from the camera
and to determine therefrom the three-dimensional position and orientation of the participant's head for the current frame
of image data from that camera.

[0053] Figure 6 shows the processing operations performed at a given one of steps S42-1 to S42-n, the processing
operations being the same at each step but being carried out on image data from a different camera.
[0054] Referring to Figure 6, at step S50, head tracker 50 reads the current estimates of the 3D position and
orientation of the participant's head, these being the estimates produced at steps S30 and S32 in Figure 3 the first time step
S50 is performed.

[0055] At step S52, head tracker 50 uses the camera calibration data generated at step S24 to render the
three-dimensional computer model of the participant's head stored in head model store 52 in accordance with the estimates
of position and orientation read at step S50.

[0056] At step S54, head tracker 50 processes the image data for the current frame of video data received from the
camera to extract the image data from each area which surrounds the expected position of one of the head features
identified by the user and stored at step S14, the expected positions being determined from the estimates read at step
S50 and the camera calibration data generated at step S24.

[0057] At step S56, head tracker 50 matches the rendered image data generated at step S52 and the camera
image data extracted at step S54 to find the camera image data which best matches the rendered head model.
[0058] At step S58, head tracker 50 uses the camera image data identified at step S56 which best matches the
rendered head model together with the camera calibration data stored at step S24 (Figure 3) to determine the 3D position
and orientation of the participant's head for the current frame of video data.

[0059] Referring again to Figure 5, at step S44, head tracker 50 uses the camera image data identified at each of
steps S42-1 to S42-n which best matches the rendered head model (identified at step S58 in Figure 6) to determine an
average 3D position and orientation of the participant's head for the current frame of video data.
[0060] At the same time that step S44 is performed, at step S46, the positions of the head features in the camera
image data determined at each of steps S42-1 to S42-n (identified at step S58 in Figure 6) are input into a conventional
Kalman filter to generate an estimate of the 3D position and orientation of the participant's head for the next frame of
video data. Steps S42 to S46 are performed repeatedly for the participant as frames of video data are received from
video camera 2-1, 2-2 and 2-3.
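
To make the roles of steps S44 and S46 concrete, the following is a deliberately simplified sketch and not the filter of the embodiment. It averages per-camera head positions and runs a constant-velocity Kalman filter on a single coordinate axis to predict the value for the next frame. The embodiment feeds image feature positions into its filter and also estimates orientation; the state model, the noise values `q` and `r`, and all names here are assumptions.

```python
def average_position(per_camera_positions):
    """Average the per-camera 3D head positions (position part of step S44 only)."""
    n = len(per_camera_positions)
    return tuple(sum(p[i] for p in per_camera_positions) / n for i in range(3))


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate axis.

    State is [position, velocity]; only position is measured (H = [1, 0]).
    q (process noise) and r (measurement noise) are assumed values.
    """

    def __init__(self, q=1e-3, r=1e-2, initial_variance=1e3):
        self.p, self.v = 0.0, 0.0
        self.P = [[initial_variance, 0.0], [0.0, initial_variance]]
        self.q, self.r = q, r

    def step(self, z, dt=1.0):
        """Fold in measurement z for the current frame, then predict the next frame."""
        (a, b), (c, d) = self.P
        # Measurement update.
        s = a + self.r                  # innovation variance
        k0, k1 = a / s, c / s           # Kalman gain
        y = z - self.p                  # innovation
        self.p += k0 * y
        self.v += k1 * y
        a, b, c, d = (1 - k0) * a, (1 - k0) * b, c - k1 * a, d - k1 * b
        # Time update with F = [[1, dt], [0, 1]] and Q = q * I.
        self.p += self.v * dt
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p                   # predicted position for the next frame


# A head coordinate drifting by one unit per frame: the prediction should follow.
kf = AxisKalman()
predictions = [kf.step(float(t)) for t in range(21)]
```

One such filter per axis would give a predicted 3D position; predicting orientation as well would need a richer state than this sketch carries.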

[0061] Referring again to Figure 3, at step S36, central controller 36 determines whether there is another
participant in the meeting, and steps S27 to S36 are repeated until processing has been performed for each participant in the
manner described above. However, while these steps are performed for each participant, at step S34, head tracker 50
continues to track the head of each participant who has already sat down. 

[0062] When it is determined at step S36 that there are no further participants in the meeting and that accordingly
the head of each participant is being tracked by head tracker 50, then, at step S38, central controller 36 causes an
audible signal to be output from processing apparatus 24 to indicate that the meeting between the participants can begin.
[0063] Figure 7 shows the processing operations performed by processing apparatus 24 as the meeting between
the participants takes place.

[0064] Referring to Figure 7, at step S70, head tracker 50 continues to track the head of each participant in the
meeting. The processing performed by head tracker 50 at step S70 is the same as that described above with respect
to step S34, and accordingly will not be described again here. 

[0065] At the same time that head tracker 50 is tracking the head of each participant at step S70, at step S72 
processing is performed to generate and store data in meeting archive database 60.
[0066] Figure 8 shows the processing operations performed at step S72.

[0067] Referring to Figure 8, at step S80, archive processor 58 generates a so-called "viewing parameter" for each
participant defining at which person or which object the participant is looking.
[0068] Figure 9 shows the processing operations performed at step S80.
[0069] Referring to Figure 9, at step S110, archive processor 58 reads the current three-dimensional position of
each participant's head from head tracker 50, this being the average position generated in the processing performed by
head tracker 50 at step S44 (Figure 5).

[0070] At step S112, archive processor 58 reads the current orientation of the head of the next participant (this
being the first participant the first time step S112 is performed) from head tracker 50. The orientation read at step S112
is the average orientation generated in the processing performed by head tracker 50 at step S44 (Figure 5).
[0071] At step S114, archive processor 58 determines the angle between a ray defining where the participant is
looking (a so-called "viewing ray") and each notional line which connects the head of the participant with the centre of
the head of another participant.

[0072] More particularly, referring to Figures 10 and 11, an example of the processing performed at step S114 is
illustrated for one of the participants, namely participant 6 in Figure 1. Referring to Figure 10, the orientation of the
participant's head read at step S112 defines a viewing ray 90 from a point between the centre of the participant's eyes
which is perpendicular to the participant's head. Similarly, referring to Figure 11, the positions of all of the participants'
heads read at step S110 define notional lines 92, 94, 96 from the point between the centre of the eyes of participant 6
to the centre of the heads of each of the other participants 8, 10, 12. In the processing performed at step S114, archive
processor 58 determines the angles 98, 100, 102 between the viewing ray 90 and each of the notional lines 92, 94, 96.
[0073] Referring again to Figure 9, at step S116, archive processor 58 selects the angle 98, 100 or 102 which has
the smallest value. Thus, referring to the example shown in Figure 11, the angle 100 would be selected.
[0074] At step S118, archive processor 58 determines whether the angle selected at step S116 has a value less
than 10°.

[0075] If it is determined at step S118 that the angle is less than 10°, then, at step S120, archive processor 58 sets
the viewing parameter for the participant to the identification number (allocated at step S2 in Figure 3) of the participant
connected by the notional line which makes the smallest angle with the viewing ray. Thus, referring to the example
shown in Figure 11, if angle 100 is less than 10°, then the viewing parameter would be set to the identification number
of participant 10 since angle 100 is the angle between viewing ray 90 and notional line 94 which connects participant 6
to participant 10.
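
Steps S114 to S120 amount to a smallest-angle test with a 10° threshold, which can be sketched as below. The function names and the representation of positions and directions as coordinate tuples are assumptions; returning `None` when no participant qualifies is also an assumption, chosen so that a caller can fall through to the object test described next.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cosine))


def looked_at_participant(eye_point, view_dir, other_heads, threshold_deg=10.0):
    """Return the id of the participant being looked at, or None.

    eye_point   : point between the viewer's eyes (origin of the viewing ray)
    view_dir    : direction of the viewing ray
    other_heads : {identification number: head centre} for the other participants
    """
    best_id, best_angle = None, None
    for pid, centre in other_heads.items():
        notional_line = tuple(c - e for c, e in zip(centre, eye_point))
        angle = angle_deg(view_dir, notional_line)
        if best_angle is None or angle < best_angle:
            best_id, best_angle = pid, angle
    if best_angle is not None and best_angle < threshold_deg:
        return best_id
    return None


# Hypothetical layout: the viewer looks along +x, almost straight at participant 10.
heads = {8: (1.0, 1.0, 0.0), 10: (2.0, 0.1, 0.0), 12: (1.0, -1.0, 0.0)}
target = looked_at_participant((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), heads)
```

Here the notional line to participant 10 makes an angle of roughly 3° with the viewing ray, so that participant is selected; the other two lines are at 45°.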

[0076] On the other hand, if it is determined at step S118 that the smallest angle is not less than 10°, then, at step
S122, archive processor 58 reads the position of each object previously stored at step S26 (Figure 3).

[0077] At step S124, archive processor 58 determines whether the viewing ray 90 of the participant intersects the
plane of any of the objects.

[0078] If it is determined at step S124 that the viewing ray 90 does intersect the plane of an object, then, at step
S126, archive processor 58 sets the viewing parameter for the participant to the identification number (allocated at step
S4 in Figure 3) of the object which is intersected by the viewing ray, this being the nearest intersected object to the
participant if more than one object is intersected by the viewing ray 90.

[0079] On the other hand, if it is determined at step S124 that the viewing ray 90 does not intersect the plane of an
object, then, at step S128, archive processor 58 sets the value of the viewing parameter for the participant to "0". This
indicates that the participant is determined to be looking at none of the other participants (since the viewing ray 90 is
not close enough to any of the notional lines 92, 94, 96) and none of the objects (since the viewing ray 90 does not
intersect an object). Such a situation could arise, for example, if the participant was looking at some object in the meeting
room for which data had not been stored at step S4 and which had not been calibrated at step S26 (for example the
notes held by participant 12 in the example shown in Figure 1).

[0080] At step S130, archive processor 58 determines whether there is another participant in the meeting, and
steps S112 to S130 are repeated until the processing described above has been carried out for each of the participants.
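
By way of illustration only, the decision made at steps S116 to S128 for a single participant might be sketched in Python as follows. The function names, the argument layout and the `object_hits` input (assumed to hold, for each object whose plane the viewing ray intersects, the distance along the ray to the intersection) are hypothetical and are not taken from the embodiment; only the 10° threshold and the return value of 0 come from the description above.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def viewing_parameter(head_pos, view_dir, others, object_hits, threshold_deg=10.0):
    """Return the identification number of whatever the participant is looking at.

    others      -- {participant_id: head position} for the other participants
    object_hits -- {object_id: distance along the viewing ray} (hypothetical input,
                   containing only objects whose plane the ray intersects)
    """
    best_id, best_angle = None, None
    for pid, pos in others.items():
        # Notional line from this participant's head to the other participant's head.
        line = tuple(p - h for p, h in zip(pos, head_pos))
        ang = angle_between(view_dir, line)
        if best_angle is None or ang < best_angle:
            best_id, best_angle = pid, ang        # step S116: smallest angle
    if best_angle is not None and best_angle < threshold_deg:
        return best_id                            # steps S118 and S120
    if object_hits:
        return min(object_hits, key=object_hits.get)  # step S126: nearest object
    return 0                                      # step S128: nothing identified
```

A call such as `viewing_parameter((0, 0, 0), (1, 0, 0), {2: (5, 0.1, 0)}, {})` would return 2 under these assumptions, since the notional line to participant 2 lies about 1° from the viewing ray.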
[0081] Referring again to Figure 8, at step S82, central controller 36 and voice recognition processor 54 determine
whether any speech data has been received from the microphone array 4 corresponding to the current frame of video
data.

[0082] If it is determined at step S82 that speech data has been received, then, at step S84, processing is
performed to determine which of the participants in the meeting is speaking.

[0083] Figure 12 shows the processing operations performed at step S84.

[0084] Referring to Figure 12, at step S140, direction processor 53 processes the sound data from the microphone
array 4 to determine the direction or directions from which the speech is coming. This processing is performed in a
conventional manner, for example as described in GB-A-2140558, US 4333170 and US 3392392.

[0085] At step S142, archive processor 58 reads the position of each participant's head determined by head tracker
50 at step S44 (Figure 5) for the current frame of image data and determines therefrom which of the participants has a
head at a position corresponding to a direction determined at step S140, that is, a direction from which the speech is
coming.

[0086] At step S144, archive processor 58 determines whether there is more than one participant in a direction
from which the speech is coming.

[0087] If it is determined at step S144 that there is only one participant in the direction from which the speech is
coming, then, at step S146, archive processor 58 selects the participant in the direction from which the speech is
coming as the speaker for the current frame of image data.

[0088] On the other hand, if it is determined at step S144 that there is more than one participant having a head at
a position which corresponds to the direction from which the speech is coming, then, at step S148, archive processor
58 determines whether one of those participants was identified as the speaker in the preceding frame of image data.

[0089] If it is determined at step S148 that one of the participants in the direction from which the speech is coming
was selected as the speaker in the preceding frame of image data, then, at step S150, archive processor 58 selects the
speaker identified for the previous frame of image data as the speaker for the current frame of image data, too. This is
because it is likely that the speaker in the previous frame of image data is the same as the speaker in the current frame.
[0090] On the other hand, if it is determined at step S148 that none of the participants in the direction from which
the speech is coming is the participant identified as the speaker in the preceding frame, or if no speaker was identified
for the previous frame, then, at step S152, archive processor 58 selects each of the participants in the direction from
which the speech is coming as a "potential" speaking participant.
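
The selection logic of steps S142 to S152 might be sketched as follows. This is a simplified illustration, not the embodiment's implementation: directions are reduced to bearings in degrees measured from the microphone array, and the matching tolerance `tolerance_deg` is a hypothetical parameter, since the description does not state how closely a head position must correspond to a sound direction.

```python
def identify_speakers(sound_dirs_deg, head_bearings_deg, previous_speaker=None,
                      tolerance_deg=10.0):
    """Return (confirmed, potential) lists of participant ids for one frame.

    sound_dirs_deg    -- bearings from which speech is arriving (step S140)
    head_bearings_deg -- {participant_id: bearing of that participant's head}
    tolerance_deg     -- hypothetical matching tolerance, not given in the text
    """
    confirmed, potential = [], []
    for direction in sound_dirs_deg:
        # Step S142: participants whose head lies in the direction of the speech.
        candidates = [
            pid for pid, bearing in head_bearings_deg.items()
            if abs((bearing - direction + 180.0) % 360.0 - 180.0) <= tolerance_deg
        ]
        if len(candidates) == 1:
            confirmed.append(candidates[0])         # step S146
        elif len(candidates) > 1:
            if previous_speaker in candidates:
                confirmed.append(previous_speaker)  # step S150
            else:
                potential.extend(candidates)        # step S152
    return confirmed, potential
```

Keeping confirmed and "potential" speakers in separate lists mirrors the distinction drawn above, and makes it straightforward to revisit the "potential" entries later once a speaker has been identified unambiguously.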
[0091] Referring again to Figure 8, at step S86, archive processor 58 stores the viewing parameter value for each
speaking participant, that is the viewing parameter value generated at step S80 defining at whom or what each
speaking participant is looking, for subsequent analysis, for example in memory 42.

[0092] At step S88, archive processor 58 informs voice recognition processor 54 of the identity of each speaking
participant determined at step S84. In response, voice recognition processor 54 selects the speech recognition
parameters for the speaking participant(s) from speech recognition parameter store 56 and uses the selected parameters to
perform speech recognition processing on the received speech data to generate text data corresponding to the words
spoken by the speaking participant(s).

[0093] On the other hand, if it is determined at step S82 that the received sound data does not contain any speech,
then steps S84 to S88 are omitted.

[0094] At step S89, archive processor 58 determines which image data is to be stored in the meeting archive
database 60, that is, the image data from which of the cameras 2-1, 2-2 and 2-3 is to be stored.

[0095] Figure 13 shows the processing operations performed by archive processor 58 at step S89.

[0096] Referring to Figure 13, at step S160, archive processor 58 determines whether any speech was detected at
step S82 (Figure 8) for the current frame of image data.

[0097] If it is determined at step S160 that there is no speech for the current frame, then, at step S162, archive
processor 58 selects a default camera as the camera from which image data is to be stored. More particularly, in this
embodiment, archive processor 58 selects the camera or cameras from which image data was recorded for the
previous frame, or, if the current frame being processed is the very first frame, then archive processor 58 selects one of the
cameras 2-1, 2-2, 2-3 at random.

[0098] On the other hand, if it is determined at step S160 that there is speech for the current frame being processed
then, at step S164, archive processor 58 reads the viewing parameter previously stored at step S86 for the next
speaking participant (this being the first speaking participant the first time step S164 is performed) to determine the person
or object at which that speaking participant is looking.

[0099] At step S166, archive processor 58 reads the head position and orientation (determined at step S44 in
Figure 5) for the speaking participant currently being considered, together with the head position and orientation of the
participant at which the speaking participant is looking (determined at step S44 in Figure 5) or the position and orientation
of the object at which the speaking participant is looking (stored at step S26 in Figure 3).

[0100] At step S168, archive processor 58 processes the positions and orientations read at step S166 to determine
which of the cameras 2-1, 2-2, 2-3 best shows both the speaking participant and the participant or object at which the
speaking participant is looking, and selects this camera as a camera from which image data for the current frame is to
be stored in meeting archive database 60.

[0101] Figure 14 shows the processing operations performed by archive processor 58 at step S168.

[0102] Referring to Figure 14, at step S176, archive processor 58 reads the three-dimensional position and viewing
direction of the next camera (this being the first camera the first time step S176 is performed), this information having
previously been generated and stored at step S24 in Figure 3.

[0103] At step S178, archive processor 58 uses the information read at step S176 together with information
defining the three-dimensional head position and orientation of the speaking participant (determined at step S44 in Figure
5) and the three-dimensional head position and orientation of the participant at whom the speaking participant is
looking (determined at step S44 in Figure 5) or the three-dimensional position and orientation of the object being looked at
(stored at step S26 in Figure 3) to determine whether the speaking participant and the participant or object at which the
speaking participant is looking are both within the field of view of the camera currently being considered (that is,
whether the camera currently being considered can see both the speaking participant and the participant or object at
which the speaking participant is looking). More particularly, in this embodiment, archive processor 58 evaluates the
following equations and determines that the camera can see both the speaking participant and the participant or object at
which the speaking participant is looking if all of the inequalities hold:

where:

(Xc, Yc, Zc) are the x, y and z coordinates respectively of the principal point of the camera (previously determined
and stored at step S24 in Figure 3)

(dXc, dYc, dZc) represent the viewing direction of the camera in the x, y and z directions respectively (again
determined and stored at step S24 in Figure 3)

θh and θv are the angular fields of view of the camera in the horizontal and vertical directions respectively (again
determined and stored at step S24 in Figure 3)

(Xp1, Yp1, Zp1) are the x, y and z coordinates respectively of the centre of the head of the speaking participant
(determined at step S44 in Figure 5)

(dXp1, dYp1, dZp1) represent the orientation of the viewing ray 90 of the speaking participant (again determined at
step S44 in Figure 5)

(Xp2, Yp2, Zp2) are the x, y and z coordinates respectively of the centre of the head of the person at whom the
speaking participant is looking (determined at step S44 in Figure 5) or of the centre of the surface of the object at
which the speaking participant is looking (determined at step S26 in Figure 3)

(dXp2, dYp2, dZp2) represent the direction in the x, y and z directions respectively of the viewing ray 90 of the
participant at whom the speaking participant is looking (again determined at step S44 in Figure 5) or of the normal to
the object surface at which the speaking participant is looking (determined at step S26 in Figure 3).
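
Equations (1) to (4) are not reproduced in this text, so the following Python sketch should be read only as one plausible form of a field-of-view test built from the quantities defined above, not as the equations of the embodiment. It assumes that z is the vertical axis and that the horizontal and vertical fields of view are full angles, so that a point is visible when its angular offset from the viewing direction is less than half of each; both assumptions, and all function names, are hypothetical.

```python
import math


def in_field_of_view(cam_pos, cam_dir, theta_h_deg, theta_v_deg, point):
    """True if `point` lies within the camera's horizontal and vertical fields of view.

    Assumes z is vertical and that theta_h_deg / theta_v_deg are full angles
    (both are assumptions made for this sketch).
    """
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    # Horizontal offset: difference in azimuth, wrapped into [-pi, pi).
    az_point = math.atan2(dy, dx)
    az_cam = math.atan2(cam_dir[1], cam_dir[0])
    h_off = abs((az_point - az_cam + math.pi) % (2 * math.pi) - math.pi)
    # Vertical offset: difference in elevation above the x-y plane.
    el_point = math.atan2(dz, math.hypot(dx, dy))
    el_cam = math.atan2(cam_dir[2], math.hypot(cam_dir[0], cam_dir[1]))
    v_off = abs(el_point - el_cam)
    return (h_off < math.radians(theta_h_deg) / 2
            and v_off < math.radians(theta_v_deg) / 2)


def camera_sees_both(cam_pos, cam_dir, theta_h_deg, theta_v_deg, speaker_head, target):
    """Counterpart of step S178: both the speaker and the target must be visible."""
    return (in_field_of_view(cam_pos, cam_dir, theta_h_deg, theta_v_deg, speaker_head)
            and in_field_of_view(cam_pos, cam_dir, theta_h_deg, theta_v_deg, target))
```

Two tests per point (horizontal and vertical) for two points gives four inequalities in total, which is consistent with the four numbered equations referred to in the description.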



[0104] If it is determined at step S178 that the camera can see both the speaking participant and the participant or
object at which the speaking participant is looking (that is, the inequalities in each of equations (1), (2), (3) and (4) above
hold), then, at step S180, archive processor 58 calculates and stores a value representing the quality of the view that
the camera currently being considered has of the speaking participant. More particularly, in this embodiment, archive
processor 58 calculates a quality value, Q1, using the following equation:

where the definitions of the terms are the same as those given for equations (1) and (2) above.

[0105] The quality value, Q1, calculated at step S180 is a scalar, having a value between -1 and +1, with the value
being -1 if the back of the speaking participant's head is directly facing the camera, +1 if the face of the speaking
participant is directly facing the camera, and a value in-between for other orientations of the speaking participant's head.
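
The equation for Q1 is not reproduced in this text. One quantity with exactly the properties stated above (a scalar between -1 and +1, equal to +1 when the face points directly at the camera and -1 when the back of the head does) is the cosine of the angle between the viewing ray and the line from the head to the camera. The following sketch computes that cosine; it is offered as an assumption consistent with the description, and the function name is hypothetical.

```python
import math


def view_quality(cam_pos, subject_pos, subject_dir):
    """Cosine of the angle between the subject's facing direction and the
    direction from the subject to the camera (an assumed form of Q1 / Q2)."""
    to_cam = [c - s for c, s in zip(cam_pos, subject_pos)]
    dot = sum(a * b for a, b in zip(to_cam, subject_dir))
    norm = (math.sqrt(sum(a * a for a in to_cam))
            * math.sqrt(sum(b * b for b in subject_dir)))
    return dot / norm
```

The same function could be applied to the participant or object being looked at, using the viewing ray of that participant or the normal to the object surface as `subject_dir`.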
[0106] At step S182, archive processor 58 calculates and stores a value representing the quality of the view that
the camera currently being considered has of the participant or object at which the speaking participant is looking.

[0107] More particularly, in this embodiment, archive processor 58 calculates a quality value, Q2, using the
following equation:

Q2 =

where the definitions of the parameters are the same as those given for equations (3) and (4) above.

[0108] Again, Q2 is a scalar having a value between -1 and +1, the value being -1 if the back of the head of the
participant or the back of the surface of the object is directly facing the camera, +1 if the face of the participant or the
front surface of the object is directly facing the camera, and values therebetween for other orientations of the
participant's head or object surface.
[0109] At step S184, archive processor 58 compares the quality value Q1 calculated at step S180 with the quality
value Q2 calculated at step S182, and selects the lowest value. This lowest value indicates the "worst view" that the
camera has of the speaking participant or the participant or object at which the speaking participant is looking (the
worst view being that of the speaking participant if Q1 is less than Q2, and that of the participant or object at which the
speaking participant is looking if Q2 is less than Q1).

[0110] On the other hand, if it is determined at step S178 that one or more of the inequalities in equations (1), (2), (3)
and (4) does not hold (that is, the camera cannot see both the speaking participant and the participant or object at
which the speaking participant is looking), then steps S180 to S184 are omitted.

[0111] At step S186, archive processor 58 determines whether there is another camera from which image data has
been received. Steps S176 to S186 are repeated until the processing described above has been performed for each
camera.

[0112] At step S188, archive processor 58 compares the "worst view" values stored for each of the cameras when
processing was performed at step S184 (that is, the value of Q1 or Q2 stored for each camera at step S184) and selects
the highest one of these stored values. This highest value represents the "best worst view" and accordingly, at step
S188, archive processor 58 selects the camera for which this "best worst view" value was stored at step S184 as a
camera from which image data should be stored in the meeting archive database, since this camera has the best view of
both the speaking participant and the participant or object at which the speaking participant is looking.
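
The "best worst view" rule of steps S184 and S188 is a max-min selection, which might be sketched as follows. The function name and the dictionary layout are hypothetical, and returning `None` when no camera can see both parties is an assumption, since the description does not state what happens in that case.

```python
def select_camera(views):
    """Pick the camera whose worse quality value is the highest.

    views -- {camera_id: (q1, q2)} for the cameras that passed the
             field-of-view test at step S178 (hypothetical layout)
    """
    if not views:
        return None  # assumed behaviour; not specified in the description
    # Step S184: the "worst view" for each camera is the lower of Q1 and Q2.
    worst = {cam: min(q1, q2) for cam, (q1, q2) in views.items()}
    # Step S188: choose the camera with the highest "worst view" value.
    return max(worst, key=worst.get)
```

For example, with quality values (0.9, -0.2), (0.4, 0.5) and (0.1, 0.8) for cameras 2-1, 2-2 and 2-3, the worst views are -0.2, 0.4 and 0.1, so camera 2-2 would be selected even though camera 2-1 has the best single view of the speaker.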
[0113] At step S170, archive processor 58 determines whether there is another speaking participant, including any
"potential" speaking participants. Steps S164 to S170 are repeated until the processing described above has been
performed for each speaking participant and each "potential" speaking participant.

[0114] Referring again to Figure 8, at step S90, archive processor 58 encodes the current frame of video data
received from the camera or cameras selected at step S89 and the sound data received from microphone array 4 as
MPEG 2 data in a conventional manner, and stores the encoded data in meeting archive database 60.

[0115] Figure 15 schematically illustrates the storage of data in meeting archive database 60. The storage structure
shown in Figure 15 is notional and is provided to assist understanding by illustrating the links between the stored
information; it does not necessarily represent the exact way in which data is stored in the memory comprising meeting
archive database 60.

[0116] Referring to Figure 15, meeting archive database 60 stores time information represented by the horizontal
axis 200, on which each unit represents a predetermined amount of time, for example the time period of one frame of
video data received from a camera. (It will, of course, be appreciated that the meeting archive database 60 will generally
contain many more time units than the number shown in Figure 15.) The MPEG 2 data generated at step S90 is stored
as data 202 in meeting archive database 60, together with timing information (this timing information being
schematically represented in Figure 15 by the position of the MPEG 2 data 202 along the horizontal axis 200).
[0117] Referring again to Figure 8, at step S92, archive processor 58 stores any text data generated by voice
recognition processor 54 at step S88 for the current frame in meeting archive database 60 (indicated at 204 in Figure 15).
More particularly, the text data is stored with a link to the corresponding MPEG 2 data, this link being represented in
Figure 15 by the text data being stored in the same vertical column as the MPEG 2 data.

[0118] As will be appreciated, there will not be any text data for storage from participants who are not speaking. In
the example shown in Figure 15, text is stored for the first ten time slots for participant 1 (indicated at 206), for the twelfth
to twentieth time slots for participant 3 (indicated at 208), and for the twenty-first time slot for participant 4 (indicated at
210). No text is stored for participant 2 since, in this example, participant 2 did not speak during the time slots shown in
Figure 15.

[0119] At step S94, archive processor 58 stores the viewing parameter value generated for the current frame for
each participant at step S80 in the meeting archive database 60 (indicated at 212 in Figure 15). Referring to Figure 15,
a viewing parameter value is stored for each participant together with a link to the associated MPEG 2 data 202 and the
associated text data 204 (this link being represented in Figure 15 by the viewing parameter values being shown in the
same column as the associated MPEG 2 data 202 and associated text data 204). Thus, referring to the first time slot in
Figure 15 by way of example, the viewing parameter value for participant 1 is 3, indicating that participant 1 is looking
at participant 3, the viewing parameter value for participant 2 is 5, indicating that participant 2 is looking at the flip chart
14, the viewing parameter value for participant 3 is 1, indicating that participant 3 is looking at participant 1, and the
viewing parameter value for participant 4 is "0", indicating that participant 4 is not looking at any of the other participants
(in the example shown in Figure 1, the participant indicated at 12 is looking at her notes rather than any of the other
participants).

[0120] At step S96, central controller 36 and archive processor 58 determine whether one of the participants in the
meeting has stopped speaking. In this embodiment, this check is performed by examining the text data 204 to
determine whether text data for a given participant was present for the previous time slot, but is not present for the current
time slot. If this condition is satisfied for any participant (that is, a participant has stopped speaking), then, at step S98,
archive processor 58 processes the viewing parameter values previously stored when step S86 was performed for each
participant who has stopped speaking (these viewing parameter values defining at whom or what the participant was
looking during the period of speech which has now stopped) to generate data defining a viewing histogram. More
particularly, the viewing parameter values for the period in which the participant was speaking are processed to generate
data defining the percentage of time during that period that the speaking participant was looking at each of the other
participants and objects.

[0121] Figures 16A and 16B show the viewing histograms corresponding to the periods of text 206 and 208
respectively in Figure 15.

[0122] Referring to Figure 15 and Figure 16A, during the period 206 when participant 1 was speaking, he was
looking at participant 3 for six of the ten time slots (that is, 60% of the total length of the period for which he was talking),
which is indicated at 300 in Figure 16A, and at participant 4 for four of the ten time slots (that is, 40% of the time), which
is indicated at 310 in Figure 16A.

[0123] Similarly, referring to Figure 15 and Figure 16B, during the period 208, participant 3 was looking at
participant 1 for approximately 46% of the time, which is indicated at 320 in Figure 16B, at object 5 (that is, the flip chart 14)
for approximately 33% of the time, indicated at 330 in Figure 16B, and at participant 2 for approximately 22% of the time,
which is indicated at 340 in Figure 16B.
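
The histogram generation of step S98 amounts to counting how often each viewing parameter value occurs during a period of speech. A minimal sketch follows, assuming that one viewing parameter value is available per time slot; the function name is hypothetical.

```python
from collections import Counter


def viewing_histogram(viewing_values):
    """Percentage of time slots in which the speaker looked at each target.

    viewing_values -- one viewing parameter value per time slot of the speech
    """
    total = len(viewing_values)
    if total == 0:
        return {}
    counts = Counter(viewing_values)
    return {target: 100.0 * n / total for target, n in counts.items()}
```

Applied to period 206, in which participant 1 looked at participant 3 for six time slots and at participant 4 for four, `viewing_histogram([3] * 6 + [4] * 4)` gives 60% for participant 3 and 40% for participant 4, matching Figure 16A.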

[0124] Referring again to Figure 8, at step S100, each viewing histogram generated at step S98 is stored in the
meeting archive database 60 linked to the associated period of text for which it was generated. Referring to Figure 15,
the stored viewing histograms are indicated at 214, with the data defining the histogram for the text period 206 indicated
at 216, and the data defining the histogram for the text period 208 indicated at 218. In Figure 15, the link between the
viewing histogram and the associated text is represented by the viewing histogram being stored in the same columns
as the text data.

[0125] On the other hand, if it is determined at step S96 that, for the current time period, one of the participants has
not stopped speaking, then steps S98 and S100 are omitted.

[0126] At step S102, archive processor 58 corrects data stored in the meeting archive database 60 for the previous
frame of video data (that is, the frame preceding the frame for which data has just been generated and stored at steps
S80 to S100) and other preceding frames, if such correction is necessary.

[0127] Figure 17 shows the processing operations performed by archive processor 58 at step S102.

[0128] Referring to Figure 17, at step S190, archive processor 58 determines whether any data for a "potential"
speaking participant is stored in the meeting archive database 60 for the next preceding frame (this being the frame
which immediately precedes the current frame the first time step S190 is performed, that is the "i-1"th frame if the
current frame is the "i"th frame).

[0129] If it is determined at step S190 that no data is stored for a "potential" speaking participant for the preceding
frame being considered, then it is not necessary to correct any data in the meeting archive database 60.

[0130] On the other hand, if it is determined at step S190 that data for a "potential" speaking participant is stored
for the preceding frame being considered, then, at step S192, archive processor 58 determines whether one of the
"potential" speaking participants for which data was stored for the preceding frame is the same as a speaking
participant (but not a "potential" speaking participant) identified for the current frame, that is a speaking participant identified
at step S146 in Figure 12.

[0131] If it is determined at step S192 that none of the "potential" speaking participants for the preceding frame is
the same as a speaking participant identified at step S146 for the current frame, then no correction of the data stored
in the meeting archive database 60 for the preceding frame being considered is carried out.

[0132] On the other hand, if it is determined at step S192 that a "potential" speaking participant for the preceding
frame is the same as a speaking participant identified at step S146 for the current frame, then, at step S194, archive
processor 58 deletes the text data 204 for the preceding frame being considered from the meeting archive database 60
for each "potential" speaking participant who is not the same as the speaking participant for the current frame.

[0133] By performing the processing at steps S190, S192 and S194 as described above, when a speaker is
positively identified by processing image and sound data for the current frame, then data stored for the previous frame for
"potential" speaking participants (that is, because it was not possible to unambiguously identify the speaker) is updated
using the assumption that the speaker in the current frame is the same as the speaker in the preceding frame.

[0134] After step S194 has been performed, steps S190 to S194 are repeated for the next preceding frame. More
particularly, if the current frame is the "i"th frame, then the "i-1"th frame is considered the first time steps S190 to S194
are performed, the "i-2"th frame is considered the second time steps S190 to S194 are performed, etc. Steps S190 to
S194 continue to be repeated until it is determined at step S190 that data for "potential" speaking participants is not
stored in the preceding frame being considered or it is determined at step S192 that none of the "potential" speaking
participants in the preceding frame being considered is the same as a speaking participant unambiguously identified
for the current frame. In this way, in cases where "potential" speaking participants were identified for a number of
successive frames, the data stored in the meeting archive database is corrected if the actual speaking participant from
among the "potential" speaking participants is identified in the next frame.
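
The backward correction of steps S190 to S194 might be sketched as follows. The per-frame record layout (a dictionary with `speakers`, `potential` and `text` entries) and the function name are hypothetical stand-ins for the data held in meeting archive database 60.

```python
def correct_potential_speakers(frames, current_index):
    """Walk backwards from the current frame, deleting text stored for
    "potential" speakers who turned out not to be the actual speaker.

    frames -- list of per-frame records (hypothetical layout), each a dict with
              'speakers'  : set of unambiguously identified speaker ids
              'potential' : set of "potential" speaker ids
              'text'      : {participant_id: text stored for that frame}
    """
    confirmed = frames[current_index]["speakers"]
    i = current_index - 1
    while i >= 0:
        potential = frames[i]["potential"]
        if not potential:                  # step S190: nothing to correct
            break
        matched = potential & confirmed    # step S192
        if not matched:
            break
        for pid in potential - matched:    # step S194: delete the other text
            frames[i]["text"].pop(pid, None)
        i -= 1                             # consider the next preceding frame
```

The loop stops at the first preceding frame that either holds no "potential" speakers or holds none matching the current speaker, which corresponds to the two termination conditions given above.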

[0135] Referring again to Figure 8, at step S104, central controller 36 determines whether another frame of video
data has been received from the cameras 2-1, 2-2, 2-3.

[0136] Steps S80 to S104 are repeatedly performed while image data is received from the cameras 2-1, 2-2, 2-3.

[0137] When data is stored in meeting archive database 60, then the meeting archive database 60 may be
interrogated to retrieve data relating to the meeting.

[0138] Figure 18 shows the processing operations performed to search the meeting archive database 60 to retrieve
data relating to each part of the meeting which satisfies search criteria specified by a user.

[0139] Referring to Figure 18, at step S200, central controller 36 causes display processor 64 to display a message
on display device 26 requesting the user to enter information defining the search of meeting archive database 60 which
is required. More particularly, in this embodiment, central controller 36 causes the display shown in Figure 19A to
appear on display device 26.

[0140] Referring to Figure 19A, the user is requested to enter information defining the part or parts of the meeting
which he wishes to find in the meeting archive database 60. More particularly, in this embodiment, the user is requested
to enter information 400 defining a participant who was talking, information 410 comprising one or more key words
which were said by the participant identified in information 400, and information 420 defining the participant or object at
which the participant identified in information 400 was looking when he was talking. In addition, the user is able to enter
time information defining a portion or portions of the meeting for which the search is to be carried out. More particularly,
the user can enter information 430 defining a time in the meeting beyond which the search should be discontinued (that
is, the period of the meeting before the specified time should be searched), information 440 defining a time in the
meeting after which the search should be carried out, and information 450 and 460 defining a start time and end time
respectively between which the search is to be carried out. In this embodiment, information 430, 440, 450 and 460 may be
entered either by specifying a time in absolute terms, for example in minutes, or in relative terms by entering a decimal
value which indicates a proportion of the total meeting time. For example, entering the value 0.25 as information 430
would restrict the search to the first quarter of the meeting.

[0141] In this embodiment, the user is not required to enter all of the information 400, 410 and 420 for one search,
and instead may omit one or two pieces of this information. If the user enters all of the information 400, 410 and 420,
then the search will be carried out to identify each part of the meeting in which the participant identified in information
400 was talking to the participant or object identified in information 420 and spoke the key words defined in information
410. On the other hand, if information 410 is omitted, then a search will be carried out to identify each part of the
meeting in which the participant defined in information 400 was talking to the participant or object defined in information 420
irrespective of what was said. If information 410 and 420 is omitted, then a search is carried out to identify each part of
the meeting in which the participant defined in information 400 was talking, irrespective of what was said and to whom.
If information 400 is omitted, then a search is carried out to identify each part of the meeting in which any of the
participants spoke the key words defined in information 410 while looking at the participant or object defined in information
420. If information 400 and 410 is omitted, then a search is carried out to identify each part of the meeting in which any
of the participants spoke to the participant or object defined in information 420. If information 420 is omitted, then a
search is carried out to identify each part of the meeting in which the participant defined in information 400 spoke the
key words defined in information 410, irrespective of to whom the key words were spoken. Similarly, if information 400
and 420 is omitted, then a search is carried out to identify each part of the meeting in which the key words identified in
information 410 were spoken, irrespective of who said the key words and to whom.

[0142] In addition, the user may enter all of the time information 430, 440, 450 and 460 or may omit one or more
pieces of this information.

[0143] Further, known Boolean operators and search algorithms may be used in combination with key words
entered in information 410 to enable the searcher to search for combinations or alternatives of words.

[0144] Once the user has entered all of the required information to define the search, he begins the search by
clicking on area 470 using a user input device such as the mouse 30.

[0145] Referring again to Figure 18, at step S202, the search information entered by the user is read by central
controller 36 and the instructed search is carried out. More particularly, in this embodiment, central controller 36 converts
any participant or object names entered in information 400 or 420 to identification numbers using the table 80 (Figure
4), and considers the text information 204 for the participant defined in information 400 (or all participants if information
400 is not entered). If information 420 has been entered by the user, then, for each period of text, central controller 36
checks the data defining the corresponding viewing histogram to determine whether the percentage of viewing time in
the histogram for the participant or object defined in information 420 is equal to or above a threshold, which, in this
embodiment, is 25%. In this way, periods of speech (text) are considered to satisfy the criteria that a participant defined
in information 400 was talking to the participant or object defined in information 420 even if the speaking participant
looked at other participants or objects while speaking, provided that the speaking participant looked at the participant
or object defined in information 420 for at least 25% of the time of the speech. Thus, for example, a period of speech in
which the value of the viewing histogram is equal to or above 25% for two or more participants would be identified if any
of these participants were specified in information 420. If the information 410 has been input by the user, then central
controller 36 and text searcher 62 search each portion of text previously identified on the basis of information 400 and
420 (or all portions of text if information 400 and 420 was not entered) to identify each portion containing the key word(s)
identified in information 410. If any time information has been entered by the user, then the searches described above
are restricted to the meeting times defined by those limits.
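
The search of step S202 might be sketched as a sequence of optional filters, as follows. The record layout, the function name and the parameter names are hypothetical; key word matching is simplified here to requiring every key word to appear as a case-insensitive substring, whereas the description allows Boolean operators and other search algorithms. Only the 25% threshold is taken from the embodiment.

```python
def search_speeches(speeches, speaker=None, keywords=None, target=None,
                    start=None, end=None, threshold=25.0):
    """Return the speeches satisfying whichever search criteria were entered.

    speeches -- list of dicts (hypothetical layout) with keys
                'speaker'   : participant id (information 400)
                'text'      : recognised text for the period of speech
                'histogram' : {target_id: percentage of viewing time}
                'start'     : start time of the speech
    """
    results = []
    for speech in speeches:
        if speaker is not None and speech["speaker"] != speaker:
            continue
        # Information 420: the target must reach the viewing-time threshold.
        if target is not None and speech["histogram"].get(target, 0.0) < threshold:
            continue
        # Information 410: simplified key word match (all words must appear).
        if keywords and not all(k.lower() in speech["text"].lower() for k in keywords):
            continue
        # Time information: restrict to the requested part of the meeting.
        if start is not None and speech["start"] < start:
            continue
        if end is not None and speech["start"] > end:
            continue
        results.append(speech)
    return results
```

Because each criterion is skipped when it is left as `None`, the same function covers every combination of omitted information described above, from a search on the speaker alone to a search on key words alone.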

[0146] At step S204, central controller 36 causes display processor 64 to display a list of relevant speeches
identified during the search to the user on display device 26. More particularly, central controller 36 causes information such
as that shown in Figure 19B to be displayed to the user. Referring to Figure 19B, a list is produced of each speech which
satisfies the search parameters, and information is displayed defining the start time for the speech both in absolute
terms and as a proportion of the full meeting time. The user is then able to select one of the speeches for playback, for
example by clicking on the required speech in the list using the mouse 30.

[0147] At step S206, central controller 36 reads the selection made by the user at step S204, and plays back the
stored MPEG 2 data 202 for the relevant part of the meeting from meeting archive database 60. More particularly,
central controller 36 and display processor 64 decode the MPEG 2 data 202 and output the image data and sound via
display device 26. If image data from more than one camera is stored for part, or the whole, of the speech to be played
back, then this is indicated to the user on display device 26 and the user is able to select the image data which is to be
replayed by inputting instructions to central controller 36, for example using keyboard 28.

[0148] At step S208, central controller 36 determines whether the user wishes to cease interrogating the meeting
archive database 60 and, if not, steps S200 to S206 are repeated.

[0149] Various modifications and changes can be made to the embodiment of the invention described above. 

[0150] For example, in the embodiment above a microphone array 4 is provided on the meeting room table to determine
the direction from which received sound has come. However, instead, a respective microphone may be provided
for each participant in the meeting (such as a microphone which attaches to the clothing of the participant). In this way,
the speaking participant(s) can be readily identified because the sound data for the participants is input into processing
apparatus 24 on respective channels.
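
As a rough illustration of this modification, the speaking participant(s) could be identified by measuring the signal level on each participant's channel. The function name, the RMS level measure and the threshold value below are assumptions made for the sketch; the description only states that the sound data arrives on respective channels.

```python
import math
from typing import Dict, List, Sequence


def speaking_participants(channels: Dict[int, Sequence[float]],
                          rms_threshold: float = 0.1) -> List[int]:
    """Return the participant numbers whose channel level reaches the threshold.

    channels maps a participant number to the audio samples received on that
    participant's own microphone for the current frame (hypothetical layout).
    rms_threshold is an assumed, uncalibrated value.
    """
    speakers = []
    for participant, samples in channels.items():
        if not samples:
            continue  # no data on this channel for the frame
        rms = math.sqrt(sum(x * x for x in samples) / len(samples))
        if rms >= rms_threshold:
            speakers.append(participant)
    return sorted(speakers)
```

Because each channel belongs to exactly one participant, no comparison between sound direction and head position is needed in this variant.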

[0151] In the embodiment above, at step S34 (Figure 3) and step S70 (Figure 7) the head of each of the participants
in the meeting is tracked. In addition, however, objects for which data was stored at step S4 and S26 could also be
tracked if they moved (such objects may comprise, for example, notes which are likely to be moved by a participant or
an object which is to be passed between the participants).

[0152] In the embodiment above, at step S168 (Figure 13), processing is performed to identify the camera which
has the best view of the speaking participant and also the participant or object at which the speaking participant is looking.
However, instead of identifying the camera in the way described in the embodiment above, it is possible for a user
to define during the initialisation of processing apparatus 24 which of the cameras 2-1, 2-2, 2-3 has the best view of
each respective pair of the seating positions around the meeting table and/or the best view of each respective seating
position and a given object (such as flip chart 14). In this way, if it is determined that the speaking participant and the
participant at whom the speaking participant is looking are in predefined seating positions, then the camera defined by
the user to have the best view of those predefined seating positions can be selected as a camera from which image
data is to be stored. Similarly, if the speaking participant is in a predefined position and is looking at an object, then the
camera defined by the user to have the best view of that predefined seating position and object can be selected as the
camera from which image data is to be stored.
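
The user-defined assignment described in this modification amounts to a lookup table keyed by a pair of positions. The sketch below is illustrative only: the table contents, the position names and the function `select_camera` are hypothetical, and it assumes that the same camera is preferred regardless of which member of the pair is speaking.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical table entered by the user during initialisation: for each pair of
# seating positions (or a seating position and an object), the preferred camera.
BEST_CAMERA: Dict[FrozenSet[str], str] = {
    frozenset({"seat 1", "seat 2"}): "2-1",
    frozenset({"seat 1", "seat 3"}): "2-2",
    frozenset({"seat 2", "flip chart 14"}): "2-3",
}


def select_camera(speaker_position: str,
                  target_position: str,
                  table: Dict[FrozenSet[str], str] = BEST_CAMERA,
                  default: Optional[str] = None) -> Optional[str]:
    """Look up the camera predefined for this pair of positions.

    A frozenset is used as the key so that the order of the pair does not matter
    (an assumption of this sketch). The default is returned for undefined pairs.
    """
    return table.get(frozenset({speaker_position, target_position}), default)
```

For pairs that are not in the table, a caller could fall back to the quality-value comparison of the main embodiment or to a default camera.
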
[0153] In the embodiment above, at step S162 (Figure 13) a default camera is selected as a camera from which
image data was stored for the previous frame. Instead, however, the default camera may be selected by a user, for
example during the initialisation of processing apparatus 24.

[0154] In the embodiment above, at step S194 (Figure 17), the text data 204 is deleted from meeting archive database
60 for the "potential" speaking participants who have now been identified as actually not being speaking participants.
In addition, however, the associated viewing histogram data 214 may also be deleted. In addition, if MPEG 2 data
202 from more than one of the cameras 2-1, 2-2, 2-3 was stored, then the MPEG 2 data related to the "potential" speaking
participants may also be deleted.

[0155] In the embodiment above, when it is not possible to uniquely identify a speaking participant, "potential"
speaking participants are defined, data is processed and stored in meeting archive database 60 for the potential speaking
participants, and subsequently the data stored in the meeting archive database 60 is corrected (step S102 in Figure
8). However, instead, rather than processing and storing data for "potential" speaking participants, video data received
from cameras 2-1, 2-2 and 2-3 and audio data received from microphone array 4 may be stored for subsequent
processing and archiving when the speaking participant has been identified from data relating to a future frame.
Alternatively, when the processing performed at step S114 (Figure 12) results in an indication that there is more than one
participant in the direction from which the speech is coming, image data from the cameras 2-1, 2-2 and 2-3 may be
processed to detect lip movements of the participants and to select as the speaking participant the participant in the
direction from which the speech is coming whose lips are moving.

[0156] In the embodiment above, processing is performed to determine the position of each person's head, the
orientation of each person's head and a viewing parameter for each person defining at whom or what the person is looking.
The viewing parameter value for each person is then stored in the meeting archive database 60 for each frame of
image data. However, it is not necessary to determine a viewing parameter for all of the people. For example, it is possible
to determine a viewing parameter for just the speaking participant, and to store just this viewing parameter value
in the meeting archive database 60 for each frame of image data. Accordingly, in this case, it would be necessary to
determine the orientation of only the speaking participant's head. In this way, processing requirements and storage
requirements can be reduced.

[0157] In the embodiment above, at step S202 (Figure 18), the viewing histogram for a particular portion of text is
considered and it is determined that the participant was talking to a further participant or object if the percentage of
gaze time for the further participant or object in the viewing histogram is equal to or above a predetermined threshold.
Instead, however, rather than using a threshold, the participant or object at whom the speaking participant was looking
during the period of text (speech) may be defined to be the participant or object having the highest percentage gaze
value in the viewing histogram (for example participant 3 in Figure 16A, and participant 1 in Figure 16B).
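
A minimal sketch of this highest-value alternative is given below. The function name is hypothetical, the histogram is assumed to be a mapping from participant or object number to percentage gaze time, and ties are resolved in favour of the first entry encountered, which the description does not specify.

```python
from typing import Dict, Optional


def gaze_target(histogram: Dict[int, float]) -> Optional[int]:
    """Return the participant or object with the highest percentage gaze time.

    Returns None for an empty histogram. On a tie, the entry that appears first
    in the mapping is returned (an assumption of this sketch).
    """
    if not histogram:
        return None
    return max(histogram, key=histogram.get)
```

Unlike the threshold test, this always yields exactly one target per period of speech, so a speech addressed evenly to two people would be attributed to only one of them.
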
[0158] In the embodiment above, the MPEG 2 data 202, the text data 204, the viewing parameters 212 and the
viewing histograms 214 are stored in meeting archive database 60 in real time as data is received from cameras 2-1,
2-2 and 2-3 and microphone array 4. However, instead, the video and sound data may be stored and data 202, 204,
212 and 214 generated and stored in meeting archive database 60 in non-real-time.

[0159] In the embodiment above, the MPEG 2 data 202, the text data 204, the viewing parameters 212 and the
viewing histograms 214 are generated and stored in the meeting archive database 60 before the database is interrogated
to retrieve data for a defined part of the meeting. However, some, or all, of the viewing histogram data 214 may
be generated in response to a search of the meeting archive database 60 being requested by the user by processing
the data already stored in meeting archive database 60, rather than being generated and stored prior to such a request.
For example, although in the embodiment above the viewing histograms 214 are calculated and stored in real-time at
steps S98 and S100 (Figure 8), these histograms could be calculated in response to a search request being input by
the user.
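
Generating a viewing histogram on demand could look like the sketch below, which derives the percentages from the per-frame viewing parameter values stored for one period of speech. The function name and the list-of-integers input are assumptions; the sketch also counts every stored value, including any value used to mean "looking at nothing".

```python
from collections import Counter
from typing import Dict, Sequence


def viewing_histogram(viewing_parameters: Sequence[int]) -> Dict[int, float]:
    """Compute the percentage gaze time per participant or object.

    viewing_parameters holds one stored viewing parameter value per frame of the
    period of speech being considered (hypothetical input layout).
    """
    total = len(viewing_parameters)
    if total == 0:
        return {}
    counts = Counter(viewing_parameters)
    return {target: 100.0 * n / total for target, n in counts.items()}
```

Computing the histogram only when a search is requested trades storage for processing time at query time.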

[0160] In the embodiment above, text data 204 is stored in meeting archive database 60. Alternatively, audio data may
be stored in the meeting archive database 60 instead of the text data 204. The stored audio data would then either itself
be searched for key words using voice recognition processing or converted to text using voice recognition processing
and the text searched using a conventional text searcher.

[0161] In the embodiment above, processing apparatus 24 includes functional components for receiving and generating
data to be archived (for example, central controller 36, head tracker 50, head model store 52, direction processor
53, voice recognition processor 54, speech recognition parameter store 56 and archive processor 58), functional
components for storing the archive data (for example meeting archive database 60), and also functional components
for searching the database and retrieving information therefrom (for example central controller 36 and text searcher 62).
However, these functional components may be provided in separate apparatus. For example, one or more apparatus
for generating data to be archived, and one or more apparatus for database searching may be connected to one or
more databases via a network, such as the Internet.

[0162] Also, referring to Figure 20, video and sound data from one or more meetings 500, 510, 520 may be input
to a data processing and database storage apparatus 530 (which comprises functional components to generate and
store the archive data), and one or more database interrogation apparatus 540, 550 may be connected to the data
processing and database storage apparatus 530 for interrogating the database to retrieve information therefrom.

[0163] In the embodiment above, processing is performed by a computer using processing routines defined by
programming instructions. However, some, or all, of the processing could be performed using hardware.

[0164] Although the embodiment above is described with respect to a meeting taking place between a number of
participants, the invention is not limited to this application, and, instead, can be used for other applications, such as to
process image and sound data on a film set etc.

[0165] Different combinations of the above modifications are, of course, possible and other changes and modifications
can be made without departing from the spirit and scope of the invention.

Claims

1. Image processing apparatus, comprising:

means for receiving image data recorded by a plurality of cameras showing the movements of a plurality of people;

speaker identification means for determining which of the people is speaking;

means for determining at whom the speaker is looking;

means for determining the position of the speaker and the position of the person at whom the speaker is looking; and

camera selection means for selecting image data from the received image data on the basis of the determined
positions of the speaker and the person at whom the speaker is looking.

2. Apparatus according to claim 1, wherein the camera selection means is arranged to select image data in which
both the speaker and the person at whom the speaker is looking appear.

3. Apparatus according to claim 2, wherein the camera selection means is arranged to generate quality values
representing a quality of the views that at least some of the cameras have of the speaker and the person at whom the
speaker is looking, and to select the image data on the basis of which camera has the quality value representing
the highest quality.

4. Apparatus according to claim 3, wherein the camera selection means is arranged to determine which of the cameras
have a view of the speaker and the person at whom the speaker is looking, and to generate a respective quality
value for each camera which has a view of the speaker and the person at whom the speaker is looking.

5. Apparatus according to claim 3 or claim 4, wherein the camera selection means is arranged to generate each quality
value in dependence upon the position and orientation of the head of the speaker and the position and orientation
of the head of the person at whom the speaker is looking.

6. Apparatus according to claim 1 or claim 2, wherein the camera selection means comprises:

data storage means for storing data defining a camera from which image data is to be selected for respective
pairs of positions; and

means for using data stored in the data storage means to select the image data in dependence upon the positions
of the speaker and the person at whom the speaker is looking.

7. Apparatus according to any preceding claim, wherein the means for determining at whom the speaker is looking
and the means for determining the positions of the speaker and the person at whom the speaker is looking comprise
image processing means for processing the image data from at least one of the cameras to determine at
whom the speaker is looking and the positions.

8. Apparatus according to claim 7, wherein the image processing means is arranged to determine the position of each
person and at whom each person is looking by processing the image data from the at least one camera.

9. Apparatus according to claim 7 or claim 8, wherein the image processing means is arranged to track the position
and orientation of each person's head in three dimensions.

10. Apparatus according to any preceding claim, wherein the speaker identification means is arranged to receive
speech data from a plurality of microphones each of which is allocated to a respective one of the people, and to
determine which of the people is speaking on the basis of the microphone from which the speech data was
received.

11. Apparatus according to any preceding claim, further comprising sound processing means for processing sound
data defining words spoken by the people to generate text data therefrom in dependence upon the result of the
processing performed by the speaker identification means.

12. Apparatus according to claim 11, wherein the sound processing means includes storage means for storing respective
voice recognition parameters for each of the people, and means for selecting the voice recognition parameters
to be used to process the sound data in dependence upon the person determined to be speaking by the speaker
identification means.

13. Apparatus according to claim 11 or claim 12, further comprising a database for storing at least some of the received
image data, the sound data, the text data produced by the sound processing means and viewing data defining at
whom at least the person who is speaking is looking, the database being arranged to store the data such that
corresponding text data and viewing data are associated with each other and with the corresponding image data and
sound data.

14. Apparatus according to claim 13, further comprising means for compressing the image data and the sound data for
storage in the database.

15. Apparatus according to claim 14, wherein the means for compressing the image data and the sound data comprises
means for encoding the image data and the sound data as MPEG data.

16. Apparatus according to any of claims 13 to 15, further comprising means for generating data defining, for a
predetermined period, the proportion of time spent by a given person looking at each of the other people during the
predetermined period, and wherein the database is arranged to store the data so that it is associated with the
corresponding image data, sound data, text data and viewing data.

17. Apparatus according to claim 16, wherein the predetermined period comprises a period during which the given
person was talking.

18. Image processing apparatus, comprising:

means for receiving image data recorded by a plurality of cameras showing the movements of a plurality of people;

speaker identification means for determining which of the people is speaking;

means for determining at what the speaker is looking;

means for determining the position of the speaker and the position of the object at which the speaker is looking; and

camera selection means for selecting image data from the received image data on the basis of the determined
positions of the speaker and the object at which the speaker is looking.

19. A method of processing image data recorded by a plurality of cameras showing the movements of a plurality of
people to select image data for storage, the method comprising:

a speaker identification step of determining which of the people is speaking;

a step of determining at whom the speaker is looking;

a step of determining the position of the speaker and the position of the person at whom the speaker is looking; and

a camera selection step of selecting image data on the basis of the determined positions of the speaker and
the person at whom the speaker is looking.

20. A method according to claim 19, wherein, in the camera selection step, image data is selected in which both the
speaker and the person at whom the speaker is looking appear.

21. A method according to claim 20, wherein, in the camera selection step, quality values are generated representing
a quality of the views that at least some of the cameras have of the speaker and the person at whom the speaker
is looking, and the image data is selected on the basis of which camera has the quality value representing the
highest quality.

22. A method according to claim 21, wherein, in the camera selection step, processing is performed to determine which
of the cameras have a view of the speaker and the person at whom the speaker is looking, and to generate a
respective quality value for each camera which has a view of the speaker and the person at whom the speaker is
looking.

23. A method according to claim 21 or claim 22, wherein, in the camera selection step, each quality value is generated
in dependence upon the position and orientation of the head of the speaker and the position and orientation of the
head of the person at whom the speaker is looking.

24. A method according to claim 19 or claim 20, wherein, in the camera selection step, pre-stored data defining a
camera from which image data is to be selected for respective pairs of positions is used to select the image data in
dependence upon the positions of the speaker and the person at whom the speaker is looking.

25. A method according to any of claims 19 to 24, wherein, in the steps of determining at whom the speaker is looking
and determining the positions of the speaker and the person at whom the speaker is looking, image data from at
least one of the cameras is processed to determine at whom the speaker is looking and the positions.

26. A method according to claim 25, wherein the image data from the at least one camera is processed to determine
the position of each person and at whom each person is looking.

27. A method according to claim 25 or claim 26, wherein image data is processed to track the position and orientation
of each person's head in three dimensions.

28. A method according to any of claims 19 to 26, wherein speech data is received from a plurality of microphones
each of which is allocated to a respective one of the people, and, in the speaker identification step, it is determined
which of the people is speaking on the basis of the microphone from which the speech data was received.

29. A method according to any of claims 19 to 28, further comprising a sound processing step of processing sound
data defining words spoken by the people to generate text data therefrom in dependence upon the result of the
processing performed in the speaker identification step.

30. A method according to claim 29, wherein the sound processing step includes selecting, from among stored respective
voice recognition parameters for each of the people, the voice recognition parameters to be used to process
the sound data in dependence upon the person determined to be speaking in the speaker identification step.

31. A method according to claim 29 or claim 30, further comprising the step of storing in a database at least some of
the received image data, the sound data, the text data produced in the sound processing step and viewing data
defining at whom at least the person who is speaking is looking, the data being stored in the database such that
corresponding text data and viewing data are associated with each other and with the corresponding image data
and sound data.

32. A method according to claim 31, wherein the image data and the sound data are stored in the database in
compressed form.

33. A method according to claim 32, wherein the image data and the sound data are stored as MPEG data.

34. A method according to any of claims 31 to 33, further comprising the steps of generating data defining, for a
predetermined period, the proportion of time spent by a given person looking at each of the other people during the
predetermined period, and storing the data in the database so that it is associated with the corresponding image
data, sound data, text data and viewing data.

35. A method according to claim 34, wherein the predetermined period comprises a period during which the given
person was talking.

36. A method according to any of claims 19 to 35, further comprising the step of generating a signal conveying
information defining the image data selected in the camera selection step.

37. A method according to any of claims 31 to 35, further comprising the step of generating a signal conveying the
database with data therein.

38. A method according to claim 37, further comprising the step of recording the signal either directly or indirectly to
generate a recording thereof.

39. A method of processing image data recorded by a plurality of cameras showing the movements of a plurality of
people to select image data for storage, the method comprising:

a speaker identification step of determining which of the people is speaking;

a step of determining at what the speaker is looking;

a step of determining the position of the speaker and the position of the object at which the speaker is looking; and

a camera selection step of selecting image data on the basis of the determined positions of the speaker and
the object at which the speaker is looking.

40. A storage device storing instructions for causing a programmable processing apparatus to become configured as
an apparatus as set out in at least one of claims 1 to 18.

41. A storage device storing instructions for causing a programmable processing apparatus to become operable to
perform a method as set out in at least one of claims 19 to 39.

42. A signal conveying instructions for causing a programmable processing apparatus to become configured as an
apparatus as set out in at least one of claims 1 to 18.

43. A signal conveying instructions for causing a programmable processing apparatus to become operable to perform
a method as set out in at least one of claims 19 to 39.

FIG. 3 to FIG. 8 (cont) (flowcharts; only fragments of the box text survive): "... numbers and names in the meeting
archive database"; "Request the next participant to sit ..."; "Generate a viewing parameter for ..."; step label S104.

FIG. 9 and FIG. 9 (cont) (flowchart; box text recovered from the figure, decision boxes including step S122 not legible):

S110: Read the current 3D position of each participant's head.
S112: Read the current orientation of the next participant's head.
S114: Determine the angle between the viewing ray of the participant and each notional line connecting the head of the
participant with the head of another participant.
S116: Select the smallest angle.
S120: Set the viewing parameter for the participant to the number of the participant connected by the notional line
which makes the smallest angle with the viewing ray.
S126: Set the viewing parameter for the participant to the number of the nearest object to the participant which is
intersected by the viewing ray.
S128: Set the viewing parameter for the participant to "0".
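
The participant branch of the Figure 9 flowchart (steps S114 to S120, with the fallback to "0" of step S128) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the data layout and the 15 degree acceptance angle are hypothetical, head positions are assumed to be distinct, and the object-intersection branch (step S126) is omitted.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]


def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def viewing_parameter(participant: int,
                      heads: Dict[int, Vec],
                      viewing_ray: Vec,
                      max_angle_deg: float = 15.0) -> int:
    """Return the number of the participant being looked at, or 0 if none.

    heads maps participant numbers to 3D head positions; viewing_ray is the
    direction in which the participant's head is facing. max_angle_deg is an
    assumed acceptance angle, not a value taken from the description.
    """
    origin = heads[participant]
    best_id, best_angle = 0, None
    for other, position in heads.items():
        if other == participant:
            continue
        # Notional line connecting this head with the other participant's head.
        line = tuple(p - o for p, o in zip(position, origin))
        angle = angle_between(viewing_ray, line)
        if best_angle is None or angle < best_angle:
            best_id, best_angle = other, angle
    if best_angle is not None and best_angle <= max_angle_deg:
        return best_id
    return 0
```

A fuller version would test the viewing ray against the stored objects before returning 0, as the flowchart indicates.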

FIG. 12 (flowchart; box text recovered from the figure, decision boxes not legible):

S140: Process data from the microphone array to determine the direction(s) from which the speech is coming.
S142: Use the calculated head positions to determine which participant(s) is/are present in the direction from which
the speech is coming.
Select the previously identified speaker as the speaker for the current frame.
Select each participant in the direction from which the speech is coming as a "potential" speaking participant.
(Step label S146 also survives; its position in the flowchart is not recoverable.)

FIG. 13 (flowchart; figure number taken from the references to step S162 in the description; box text recovered from the figure):

S162: Select default camera(s) as the camera(s) from which image data is to be stored.
S164: Read viewing parameter for the next speaking participant to determine at whom or what they are looking.
S166: Read head position and orientation for the speaking participant together with the head position and orientation
of the participant being spoken to or the position and orientation of the object being spoken to.
Determine the camera which best shows the speaking participant and the participant being spoken to or the object
being spoken to, and select this camera as a camera from which image data is to be stored.
Decision: Another speaking participant or "potential" speaking participant? (YES/NO)

FIG. (number not legible; flowchart for determining the camera with the best view; box text recovered from the figure,
surviving step labels S178, S180, S184 and S188):

Read position and direction of next camera.
Decision: Can the camera see both the speaking participant and the participant or object at which the speaking
participant is looking? (YES)
Calculate and store value representing the quality of the view that the camera has of the speaking participant.
Calculate and store value representing the quality of the view that the camera has of the participant or object at which
the speaking participant is looking.
Compare the quality values of the speaking participant and the participant or object at which the speaking participant
is looking, and store the value for the worst view.
Compare the stored "worst view" values, and select the camera which has the best "worst view" value as a camera
from which image data should be stored in the meeting archive database.
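
The selection rule in this flowchart is a max-min choice: each camera is scored by the worse of its two views, and the camera with the best such score is chosen. The sketch below illustrates that rule only; the function name and the input layout are hypothetical, higher values are assumed to mean better views, and how the quality values themselves are calculated is not shown.

```python
from typing import Dict, Optional, Tuple


def best_worst_view_camera(views: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Select the camera whose worse view is the best among all cameras.

    views maps a camera identifier to (quality of the view of the speaking
    participant, quality of the view of the participant or object being looked
    at), and contains only cameras that can see both (hypothetical layout).
    """
    if not views:
        return None
    return max(views, key=lambda camera: min(views[camera]))
```

Scoring by the worse of the two views avoids picking a camera that shows one party very well but the other hardly at all.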

FIG. 16A and FIG. 16B (viewing histograms; reference numerals 320 and 340): vertical axis "Percentage gaze time"
(scale marked at 50 and 100), horizontal axis "Participant/object" (1 to 4). The bar heights are not recoverable.

FIG. (number not legible; flowchart fragment): "... preceding frame being considered from the meeting archive database
for each "potential" speaking participant who is not the same as the speaking participant for the current frame".

FIG. 18 (flowchart; box text recovered from the figure):

Start.
S200: Prompt user to enter search information.
S202: Read search information and perform search.
S204: Display list of relevant speeches and prompt user to select one.
Read selection and play back audio and visual data for the selection.
Decision box (text not legible), with a NO branch.



EP1 045 586 A2 



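FIG. 19A (search entry screen): "Please enter search parameters", with entry fields arranged as
[400] talking about [410] to [420], time limits "Before", "After" and "Between ... and ...", and a START button.
Reference numerals 430, 450, 460 and 470 also appear; their exact assignment to the time-limit fields and the
START button is not recoverable.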



The following parts of the meeting are relevant. Please
select one for playback:

1. Speech starting at 10 mins 0 secs (0.4 x full meeting time)

2. Speech starting at 12 mins 30 secs (0.5 x full meeting time)

FIG. 19B

FIG. 20 (block diagram): data processing and database storage apparatus 530, database interrogation apparatus 540
and database interrogation apparatus 550.